
Distributed VLMs: Efficient vision-language processing through cloud-edge collaboration

Li, Y; Gumaste, D; Turkcan, M; Ghaderi, LJ; Zussman, G; Kostic, Z (February 2025, in Proc. 4th IEEE Workshop on Pervasive and Resource-constrained Artificial Intelligence (PeRConAI))

Vision-language models (VLMs) have transformed generative AI by enabling systems to interpret and respond to multimodal data in real time. While advances in edge computing have made it possible to deploy smaller large language models (LLMs) on smartphones and laptops, deploying competent VLMs on edge devices remains challenging because of their high computational demands. Furthermore, cloud-only deployments fail to use the growing processing capabilities at the edge and limit responsiveness. This paper introduces a distributed architecture for VLMs that addresses these limitations by partitioning model components between edge devices and central servers. In this setup, the vision components run on edge devices for immediate processing, while the VLM's language generation is handled by a centralized server, resulting in up to a 33% improvement in throughput over traditional cloud-only solutions. Moreover, our approach enhances the computational efficiency of off-the-shelf VLMs without the need for model compression techniques. This work demonstrates the scalability and efficiency of a hybrid architecture for VLM deployment and contributes to the discussion on how distributed approaches can improve VLM performance.

Index Terms: vision-language models (VLMs), edge computing, distributed computing, inference optimization, edge-cloud collaboration.
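The edge-server partition described in the abstract can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the "vision encoder" is a toy average-pooling function, the "language generator" is a string-templating stub, and every name here (`VisionFeatures`, `edge_vision_encoder`, `server_language_generator`, `run_pipeline`) is hypothetical. JSON stands in for whatever transport the real system uses.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class VisionFeatures:
    """Compact features produced on the edge device (hypothetical type)."""
    frame_id: int
    embedding: List[float]


def edge_vision_encoder(frame_id: int, pixels: List[List[float]],
                        patch: int = 2) -> VisionFeatures:
    """Toy stand-in for a vision encoder: average-pool patch x patch blocks."""
    height, width = len(pixels), len(pixels[0])
    embedding = []
    for r in range(0, height, patch):
        for c in range(0, width, patch):
            block = [pixels[i][j]
                     for i in range(r, min(r + patch, height))
                     for j in range(c, min(c + patch, width))]
            embedding.append(sum(block) / len(block))
    return VisionFeatures(frame_id, embedding)


def serialize(features: VisionFeatures) -> bytes:
    """Encode features for the edge-to-server hop (JSON is an assumption)."""
    body = {"frame_id": features.frame_id, "embedding": features.embedding}
    return json.dumps(body).encode("utf-8")


def deserialize(payload: bytes) -> VisionFeatures:
    """Decode the payload on the server side."""
    body = json.loads(payload.decode("utf-8"))
    return VisionFeatures(body["frame_id"], body["embedding"])


def server_language_generator(features: VisionFeatures, prompt: str) -> str:
    """Toy stand-in for language generation: summarize the features as text."""
    mean = sum(features.embedding) / len(features.embedding)
    return (f"[frame {features.frame_id}] {prompt} -> "
            f"{len(features.embedding)} visual features, mean activation {mean:.2f}")


def run_pipeline(frame_id: int, pixels: List[List[float]], prompt: str) -> str:
    payload = serialize(edge_vision_encoder(frame_id, pixels))  # runs on the edge
    return server_language_generator(deserialize(payload), prompt)  # runs on the server


if __name__ == "__main__":
    frame = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    print(run_pipeline(7, frame, "Describe the scene"))
```

In this sketch only the serialized features cross the network, so the server never touches raw pixels; how the actual system transports and batches features is not specified in the abstract.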

Free, publicly-accessible full text available February 1, 2026
